Goto

Collaborating Authors

 Sea of Marmara


Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Padovese, Bruno, Frazao, Fabio, Dowd, Michael, Joy, Ruth

arXiv.org Artificial Intelligence

Automated detection and classification of marine mammals vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative for data augmentation in marine mammal call detection including: Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative-based synthesis with traditional methods achieved the best overall performance with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.


RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

Yu, Zichun, Xiong, Chenyan

arXiv.org Artificial Intelligence

High-quality pretraining data is the fossil fuel of large language models (LLMs), yet its reserves are running low for frontier models. In this paper, we introduce RePro, a novel web recycling method that trains a relatively small LM with reinforcement learning to generate effective and faithful rephrasings of pretraining data. Specifically, we design one quality reward and three faithfulness rewards, optimizing the LM rephraser to convert organic data into high-quality rephrasings while maintaining its core semantics and structure. In our experiment, we train a 4B rephraser to recycle 72B tokens sampled from DCLM-RefinedWeb. Pretraining results on 400M and 1.4B models demonstrate that RePro delivers 4.7%-14.0% relative accuracy gains over organic-only baseline on 22 downstream tasks. RePro also outperforms ReWire, the state-of-the-art web recycling method that prompts a 70B rephraser, as well as the organic baseline with a 4x larger data pool. Experiments with different amounts of recycled data highlight that RePro improves organic data efficiency by 2-3x. Individual and distributional analyses validate that RePro preserves more critical information and faithfully reflects the characteristics of organic data compared to prompting-based methods. Together, these results show that RePro provides an efficient and controllable path to effectively harness the fossil fuel of LLM pretraining. We open-source our code, rephraser, and recycled data at https://github.com/cxcscmu/RePro.


MedFormer: a data-driven model for forecasting the Mediterranean Sea

Epicoco, Italo, Donno, Davide, Accarino, Gabriele, Norberti, Simone, Grandi, Alessandro, Giurato, Michele, McAdam, Ronan, Elia, Donatello, Clementi, Emanuela, Nassisi, Paola, Scoccimarro, Enrico, Coppini, Giovanni, Gualdi, Silvio, Aloisio, Giovanni, Masina, Simona, Boccaletti, Giulio, Navarra, Antonio

arXiv.org Artificial Intelligence

Accurate ocean forecasting is essential for supporting a wide range of marine applications. Recent advances in artificial intelligence have highlighted the potential of data-driven models to outperform traditional numerical approaches, particularly in atmospheric weather forecasting. However, extending these methods to ocean systems remains challenging due to their inherently slower dynamics and complex boundary conditions. In this work, we present MedFormer, a fully data-driven deep learning model specifically designed for medium-range ocean forecasting in the Mediterranean Sea. MedFormer is based on a U-Net architecture augmented with 3D attention mechanisms and operates at a high horizontal resolution of 1/24°. The model is trained on 20 years of daily ocean reanalysis data and fine-tuned with high-resolution operational analyses. It generates 9-day forecasts using an autoregressive strategy. The model leverages both historical ocean states and atmospheric forcings, making it well-suited for operational use. We benchmark MedFormer against the state-of-the-art Mediterranean Forecasting System (MedFS), developed at Euro-Mediterranean Center on Climate Change (CMCC), using both analysis data and independent observations. The forecast skills, evaluated with the Root Mean Squared Difference and the Anomaly Correlation Coefficient, indicate that MedFormer consistently outperforms MedFS across key 3D ocean variables. These findings underscore the potential of data-driven approaches like MedFormer to complement, or even surpass, traditional numerical ocean forecasting systems in both accuracy and computational efficiency.


Ovis2.5 Technical Report

Lu, Shiyin, Li, Yang, Xia, Yu, Hu, Yuwei, Zhao, Shanshan, Ma, Yanqing, Wei, Zhichao, Li, Yinglun, Duan, Lunhao, Zhao, Jianshan, Han, Yuxuan, Li, Haijun, Chen, Wanying, Tang, Junke, Hou, Chengkun, Du, Zhixing, Zhou, Tianli, Zhang, Wenjie, Ding, Huping, Li, Jiahe, Li, Wen, Hu, Gui, Gu, Yiliang, Yang, Siran, Wang, Jiamang, Sun, Hailong, Wang, Yibo, Sun, Hui, Huang, Jinlong, He, Yuping, Shi, Shengze, Zhang, Weihong, Zheng, Guodong, Jiang, Junpeng, Gao, Sensen, Wu, Yi-Feng, Chen, Sijia, Chen, Yuhui, Chen, Qing-Guo, Xu, Zhao, Luo, Weihua, Zhang, Kaifu

arXiv.org Artificial Intelligence

We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the degradation from fixed-resolution tiling and preserving both fine detail and global layout -- crucial for visually dense content like complex charts. To strengthen reasoning, we train the model to move beyond linear chain-of-thought and perform reflection -- including self-checking and revision. This advanced capability is exposed as an optional "thinking mode" at inference time, allowing users to trade latency for enhanced accuracy on difficult inputs. The model is trained via a comprehensive five-phase curriculum that progressively builds its skills. The process begins with foundational visual and multimodal pretraining, advances through large-scale instruction tuning, and culminates in alignment and reasoning enhancement using DPO and GRPO. To scale these upgrades efficiently, we employ multimodal data packing and hybrid parallelism, yielding a significant end-to-end speedup. We release two open-source models: Ovis2.5-9B and Ovis2.5-2B. The latter continues the "small model, big performance" philosophy of Ovis2, making it ideal for resource-constrained, on-device scenarios. On the OpenCompass multimodal leaderboard, Ovis2.5-9B averages 78.3, marking a substantial improvement over its predecessor, Ovis2-8B, and achieving state-of-the-art results among open-source MLLMs in the sub-40B parameter range; Ovis2.5-2B scores 73.9, establishing SOTA for its size. Beyond aggregate scores, Ovis2.5 achieves leading results on STEM benchmarks, exhibits strong capabilities on grounding and video tasks, and achieves open-source SOTA at its scale for complex chart analysis.


Regional Ocean Forecasting with Hierarchical Graph Neural Networks

Holmberg, Daniel, Clementi, Emanuela, Roos, Teemu

arXiv.org Artificial Intelligence

Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Building on these advancements, we introduce SeaCast, a neural network designed for high-resolution, medium-range ocean forecasting. SeaCast employs a graph-based framework to effectively handle the complex geometry of ocean grids and integrates external forcing data tailored to the regional ocean context. Our approach is validated through experiments at a high spatial resolution using the operational numerical model of the Mediterranean Sea provided by the Copernicus Marine Service, along with both numerical and data-driven atmospheric forcings.


OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

Lu, Bin, Zhao, Ze, Han, Luyu, Gan, Xiaoying, Zhou, Yuntao, Zhou, Lei, Fu, Luoyi, Wang, Xinbing, Zhou, Chenghu, Zhang, Jing

arXiv.org Artificial Intelligence

Accurately reconstructing the global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystem. Existing expert-dominated numerical simulations fail to catch up with the dynamic variation caused by global warming and human activities. Besides, due to the high-cost data collection, the historical observations are severely sparse, leading to big challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep learning based model, to reconstruct the global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77%, demonstrating a promising potential to understand the "breathless ocean" in data-driven manner.


A novel interface for adversarial trivia question-writing

Liu, Jason

arXiv.org Artificial Intelligence

A critical component when developing question-answering AIs is an adversarial dataset that challenges models to adapt to the complex syntax and reasoning underlying our natural language. Present techniques for procedurally generating adversarial texts are not robust enough for training on complex tasks such as answering multi-sentence trivia questions. We instead turn to human-generated data by introducing an interface for collecting adversarial human-written trivia questions. Our interface is aimed towards question writers and players of Quiz Bowl, a buzzer-based trivia competition where paragraph-long questions consist of a sequence of clues of decreasing difficulty. To incentivize usage, a suite of machine learning-based tools in our interface assist humans in writing questions that are more challenging to answer for Quiz Bowl players and computers alike. Not only does our interface gather training data for the groundbreaking Quiz Bowl AI project QANTA, but it is also a proof-of-concept of future adversarial data collection for question-answering systems. The results of performance-testing our interface with ten originally-composed questions indicate that, despite some flaws, our interface's novel question-writing features as well as its real-time exposure of useful responses from our machine models could facilitate and enhance the collection of adversarial questions. The code for our interface is available at: https://github.com/Zefan-Cai/QAML


Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

Höhl, Adrian, Obadic, Ivica, Torres, Miguel Ángel Fernández, Najjar, Hiba, Oliveira, Dario, Akata, Zeynep, Dengel, Andreas, Zhu, Xiao Xiang

arXiv.org Artificial Intelligence

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still missing. In this paper, we address this issue by performing a systematic review to identify the key trends of how explainable AI is used in Remote Sensing and shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights in Remote Sensing, and reflect on the approaches used for explainable AI methods evaluation. Our review provides a complete summary of the state-of-the-art in the field. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field of explainable AI in Remote Sensing.


War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars

Hua, Wenyue, Fan, Lizhou, Li, Lingyao, Mei, Kai, Ji, Jianchao, Ge, Yingqiang, Hemphill, Libby, Zhang, Yongfeng

arXiv.org Artificial Intelligence

Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose \textbf{WarAgent}, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including the World War I (WWI), the World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems' abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at \url{https://github.com/agiresearch/WarAgent}.


Self-Prompting Large Language Models for Zero-Shot Open-Domain QA

Li, Junlong, Zhang, Zhuosheng, Zhao, Hai

arXiv.org Artificial Intelligence

Open-Domain Question Answering (ODQA) aims at answering factoid questions without explicitly providing specific background documents. In a zero-shot setting, this task is more challenging since no data is available to train customized models like Retriever-Readers. Recently, Large Language Models (LLMs) like GPT-3 have shown their power in zero-shot ODQA with direct prompting methods, but these methods are still far from releasing the full powerfulness of LLMs only in an implicitly invoking way. In this paper, we propose a Self-Prompting framework to explicitly utilize the massive knowledge stored in the parameters of LLMs and their strong instruction understanding abilities. Concretely, we prompt LLMs step by step to generate multiple pseudo QA pairs with background passages and explanations from scratch and then use those generated elements for in-context learning. Experimental results show our method surpasses previous SOTA methods significantly on three widely-used ODQA datasets, and even achieves comparable performance with some Retriever-Reader models fine-tuned on full training data.